(54) Vector quantization method and apparatus

(57) The present invention relates to a method of en- 
coding a hyper-spectral image datacube using vector 
quantisation. According to the invention, a temporary 
codebook having a small number, n, of codevectors is 
generated from the datacube. The datacube is proc- 
essed using the temporary codebook to form n clusters 
(subsets) of vectors. A codevector corresponds to a 
cluster and is the centre of gravity for the cluster. In the 
compression process, vectors in each cluster are encoded
by the corresponding codevector. Then the reconstruction
fidelity of the encoded cluster is evaluated.
When the fidelity of an encoded cluster is better than a 
predetermined fidelity, the codevector relating to that 
cluster is stored in a final codebook and the vectors in 
the cluster are expressed with the index (address) of the 
codevector in the final codebook. When the fidelity of an 
encoded cluster is not suitable, the cluster is reencoded 
with a new temporary codebook generated from this 
cluster, and the same process is repeated. The com- 
pression process is recursively implemented until all 
clusters are processed. 



[Figure 1: hyper-spectral datacube; axis labels: Cross Track, Spectral Band]






EP 1 209 627 A2 



Description 
Field of the invention 

[0001] The invention relates to data compression and 
more particularly to compression of multidimensional 
data representations using vector quantisation. 

Background of the Invention 

[0002] The next generation of satellite-based remote 
sensing instruments will produce an unprecedented vol- 
ume of data. Imaging spectrometers, also known as hy- 
per-spectral imaging devices, are prime examples. 
They collect image data in hundreds of spectral bands 
simultaneously from the near ultraviolet to the short 
wave infrared, and are capable of providing direct iden- 
tification of surface materials. 

[0003] Hyper-spectral data thus collected are typically 
in the form of a three-dimensional (3D) data cube. Each 
data cube has two dimensions in the spatial domain de- 
fining a rectangular plane of image pixels, and a third 
dimension in the spectral domain defining radiance lev- 
els of multiple spectral bands per each image pixel. The 
volume and complexity of hyper-spectral data present 
a significant challenge to conventional transmission and 
image analysis methods. The raw data rates for trans- 
mitting such data cubes can easily exceed the available 
downlink capacity or on-board storage capacity of exist- 
ing satellite systems. Often, therefore, a portion of the 
data collected on board is discarded before transmis- 
sion, by reducing the duty cycle, reducing the spatial or 
spectral resolution, and/or reducing the spatial or spec- 
tral range. Obviously, in such cases large amounts of 
information are lost. 

[0004] For data processing, a similar problem occurs. 
In computing, a current trend is toward desktop comput- 
ers and Internet based communications. Unfortunately, 
the data cubes require a tremendous amount of storage 
and, for processing functions, the preferred storage is 
random access memory (RAM). Current desktop com- 
puters often lack sufficient resources for data process- 
ing of data cubes comprising spectral data. 
[0005] Recent work related to data compression of 
multi-spectral and hyper-spectral imagery has been re- 
ported in the literature, but most of these studies relate 
to multi-spectral imagery comprised of only a few spec- 
tral bands. These prior art systems for multi-spectral im- 
agery yield small compression ratios, usually smaller 
than 30:1. There are two reasons for this:

1) the prior art systems do not efficiently remove the
correlation in the spectral domain, and 

2) the redundancy of multi-spectral imagery in the 
spectral domain is relatively small compared to that 
of hyper-spectral imagery. 



[0006] Gen et al. teach two systems for hyper-spectral
imagery. The first system uses trellis coded quantisation
to encode transform coefficients resulting from the
application of an 8×8×8 discrete cosine transform. The
second system uses differential pulse code modulation to
spectrally decorrelate data, while using a 2D discrete
cosine transform for spatial decorrelation. These two
systems are known to achieve compression ratios of
greater than 70:1 in some instances; however, it is
desirable to have higher compression ratios with simpler
coding structures than those reported in the literature.
[0007] In an article entitled "Lossy Compression of
Hyperspectral Data Using Vector Quantization" by
Michael Ryan and John Arnold in the journal Remote
Sens. Environ., Elsevier Science Inc., New York, N.Y.,
1997, Vol. 61, pp. 419-436, an overview of known general
vector quantization techniques is presented. The
article is herein incorporated by reference. In particular,
the authors describe issues such as distortion measures
and classification issues arising from lossy compression
of hyper-spectral data using vector quantization.
[0008] Data compression using Vector Quantisation
(VQ) has received much attention because of its promise
of high compression ratio and relatively simple structure.
Unlike scalar quantisation, VQ requires segmentation
of the source data into vectors. Commonly, in
two-dimensional (2D) image data compression, a block with
n×m (n may be equal to m) pixels is taken as a vector,
whose length is equal to n×m. Vectors constituted in this
way have no physical analogue. Because the blocks are
segmented according to row and column indices of an
image, the vectors obtained in this manner change at
random as the pixel patterns change from block to block.
The reconstructed image shows an explicit blocking
effect for large compression ratios.

[0009] There are several conventional approaches to
constituting vectors in a 3D data cube of hyper-spectral
imagery. The simplest approach is to treat the 3D data
cube as a set of 2D monochromatic images, and segment
each monochromatic image into vectors independently
as in the 2D-image case. This approach, however,
does not take full advantage of the high correlation
of data in the spectral domain. There is therefore a need
for a data compression system that takes advantage of
correlation in the spectral domain and of 2D spatial
correlation between adjacent image pixels.
[0010] The VQ procedure is known to have two main
steps: codebook generation and codevector matching.
VQ can be viewed as mapping a large set of vectors into
a small cluster of indexed codevectors forming a codebook.
During encoding, a search through a codebook is
performed to find a best codevector to express each input
vector. The index or address of the selected codevector
in the codebook is stored in association with the input
vector or the input vector location. Given two systems
having the same codebook, transmission of the index
over a communication channel from the first system to
the second system allows a decoder within the second
system to retrieve the same codevector from an identical
codebook. This is a reconstructed approximation of the
corresponding input vector. Compression is thus obtained
by transmitting the index of the codevector rather than
the codevector itself. Many existing algorithms for
codebook design are available, such as the LBG algorithm
reported by Linde, Buzo and Gray, the tree-structure
codebook algorithm reported by Gray, and the self
organising feature map reported by Nasrabadi and Feng.
Among these, the LBG algorithm is most widely used
because of its fidelity. The disadvantages of the LBG
algorithm are its complexity and the time burden taken to
form the codebook. When the input data is a 3D data cube
of hyper-spectral imagery, the processing time can be
hundreds of times higher than the normal 2D-image case.
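
By way of illustration only, the codevector matching and index transmission described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names `encode` and `decode`, the squared Euclidean distance used as the matching criterion, and the toy three-band vectors are chosen for the example and are not taken from the cited algorithms.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def encode(vectors, codebook):
    """Map each input vector to the index of its best matching codevector."""
    return [
        min(range(len(codebook)), key=lambda i: squared_distance(v, codebook[i]))
        for v in vectors
    ]


def decode(indices, codebook):
    """Rebuild an approximation of the input from indices and an identical codebook."""
    return [codebook[i] for i in indices]


# Hypothetical two-entry codebook shared by the encoder and the decoder.
codebook = [(0.0, 0.0, 0.0), (10.0, 10.0, 10.0)]
vectors = [(1.0, 0.0, 1.0), (9.0, 11.0, 10.0), (0.0, 1.0, 0.0)]

indices = encode(vectors, codebook)        # only these indices are transmitted
reconstructed = decode(indices, codebook)  # lossy approximation at the receiver
```

Only the list of indices crosses the channel; the receiver recovers codevectors, not the original vectors, which is why the scheme is lossy.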
[0011] It is, therefore, an object of the present invention
to provide a substantially faster codebook generation
algorithm with relatively high fidelity for encoding
hyperspectral data and the like.
[0012] It is another object of the present invention to
provide a data compression system for multidimensional
data.



Summary of the Invention 
[0013] In accordance with the invention there is provided,
a method for encoding a hyper-spectral image
datacube using vector quantisation, wherein the encoded
hyper-spectral data is compressed data, the method
comprising the steps of:



a) determining a codebook having a plurality of 
codevectors; 

b) encoding each spectral vector of the hyper-spec- 
tral data by determining a codevector within the 
codebook that approximates the spectral vector 
within the hyper-spectral data; 

c) determining a fidelity of the encoded hyper-spec- 
tral data; and, 



d) when the fidelity of a cluster of spectral vectors
encoded by a codevector is below a predetermined
fidelity, determining another codebook relating to a
subset of at least a codevector within the previously
determined codebook, encoding each spectral vector
within the hyper-spectral data of the cluster that
is associated with codevectors within the subset using
the other codebook, and returning to step (c).

[0014] In accordance with the invention there is provided,
a method for encoding a hyper-spectral image
datacube using vector quantisation, wherein the encoded
hyper-spectral data is compressed data, the method
comprising the steps of:



a) determining a codebook having a plurality of 
codevectors; 

b) encoding each spectral vector of the hyper-spec- 
tral data by determining a codevector within the 
codebook that approximates the spectral vector 
within the hyper-spectral data; and,

c) repeating the steps of: 

determining another codebook relating to a 
subset of codevectors within the previously de- 
termined codebook, encoding each spectral 
vector within the hyper-spectral data that is as- 
sociated with codevectors within the subset us- 
ing the other codebook, 

until a desired number of codevectors exists 
within each of the codebooks. 

[0015] In accordance with the invention there is fur- 
ther provided, a system for encoding a hyper-spectral 
image datacube using vector quantisation, wherein the 
encoded hyper-spectral data is compressed data, the
system comprising: 

a first port for receiving the hyper-spectral image 
data; 

a suitably programmed processor for: 

a) determining a codebook having a plurality of 
codevectors; 

b) encoding each spectral vector of the hyper- 
spectral data by determining a codevector with- 
in the codebook that approximates the spectral 
vector within the hyper-spectral data; 

c) determining a fidelity of the encoded hyper- 
spectral data; and, 

d) when the fidelity of a cluster of spectral vec- 
tors encoded by a codevector is below a pre- 
determined fidelity, determining another code- 
book relating to a subset of codevectors within 
the previously determined codebook, encoding 
each spectral vector of the cluster of vectors 
within the hyper-spectral data that is associat- 
ed with codevectors within the subset using the 
other codebook, and returning to step (c); 

memory for storing data during execution of steps 
(a) to (d); and, 

a second port for providing the encoded hyper- 
spectral data. 
[0016] In accordance with the invention there is yet
further provided, a system for encoding a hyper-spectral
image datacube using vector quantisation, wherein the
encoded hyper-spectral data is compressed data, the
system comprising:

a first port for receiving the hyper-spectral image 
data; 



a suitably programmed processor for: 

a) determining a codebook having a plurality of 
codevectors; 

b) encoding each spectral vector of the hyper- 
spectral data by determining a codevector with- 
in the codebook that approximates the spectral 
vector within the hyper-spectral data; and, 

c) repeating the steps of: 

determining another codebook relating to 
a subset of codevectors within the previ- 
ously determined codebook, encoding 
each spectral vector within the hyper-spec- 
tral data that is associated with codevec- 
tors within the subset using the other code- 
book, 



until a desired number of codevectors exists
within each of the codebooks; and

memory for storing data during execution of steps
a) to c); and,

a second port for providing the encoded hyper-spectral
data.

Brief Description of the Drawings 

[0017] Exemplary embodiments of the invention will 
now be described in conjunction with the drawings in 
which: 



Fig. 1 is a perspective view of a hyper-spectral data
cube having two dimensions in the spatial domain
defining a rectangular plane of image pixels, and a
third dimension in the spectral domain defining radiance
levels of hyper-spectral bands for each image pixel;

Fig. 2 is a simplified block diagram of a prior art
embodiment of a system for compressing, communicating
and processing in a compressed form, hyper-spectral
image data defining radiance levels of multiple
spectral bands per image pixel, said system
being initially trained with image data defining a
predetermined set of training image pixels;



Fig. 3 is a flow diagram of a preferred embodiment 
of a method for encoding hyper-spectral image data 
according to the invention; 

Fig. 4 is a simplified diagram of the method shown 
in Fig. 3; 

Fig. 5 is a simplified diagram of a system for encoding
hyper-spectral image data according to the invention;

Fig. 6 is a simplified diagram of another method for
encoding hyper-spectral image data according to
the invention;

Fig. 7 is a simplified diagram of another system for
encoding hyper-spectral image data according to
the invention; and,

Fig. 8 is a simplified diagram of another system for
encoding hyper-spectral image data according to
the invention.

Detailed Description of the Invention 

[0018] In spectral imaging using satellite based hy- 
perspectral imagers, there is a tremendous amount of 
data captured that requires transmission to a terrestrial 
base for analysis and other uses. Typically, it is desirable 
to compress the captured data before transmitting same 
to the ground. Unfortunately, this often requires complex 
and expensive compression hardware. Using the meth- 
od of the present invention, it is possible to implement 
a fast data encoding system using commonly available 
processors that operates in real-time and is suitable to 
installation on board a satellite system. 
[0019] Referring now to Fig. 1, a hyperspectral data
cube is shown. The data is a set of spectral data cap- 
tured by a hyperspectral imager mounted within a sat- 
ellite. As such, there is a strong correlation between 
spectra of similar items such as snow or water. For ex- 
ample, spectra relating to locations wherein snow is 
found are often similar, but not identical. Encoding of 
these data in order to compress them can be performed 
using a vector quantisation (VQ) method. Referring to 
Fig. 2, a simplified flow diagram of a prior art method of 
performing vector quantisation is shown. A codebook having
a known number of codevectors is generated from a
hyperspectral datacube. Generation of a codebook
is a time-consuming and difficult task since the
ability of the codevectors within the codebook to approx- 
imate each vector in the datacube is essential in order 
to provide compressed data having good fidelity - mini- 
mal error. Thus, a set of codevectors within the code- 
book is selected and compared against vectors in the 
datacube to ensure that they "best" approximate each 
vector in some fashion. 

[0020] Once the codebook is generated, the vectors
within the datacube are encoded by replacing each vector
with an index of the best matching codevector within
the codebook. For example, when a datacube has
1,000,000 vectors and a codebook has 256 codevectors,
the resulting compressed data requires memory for
256 vectors being the codebook and 1,000,000 bytes
forming the entire array of indices. Even if the codevectors
have 100 bytes each, the resulting encoded data
requires just over a megabyte while the original data
required 100 megabytes. However, though the compression
ratio is high, the fidelity may be poor since the
compression is lossy. Further, different codebooks will result
in different - better or worse - fidelity as noted above.
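
The arithmetic of the example above can be checked with a short calculation. This is an illustrative sketch; the function name `encoded_size_bytes` and the assumption of one byte per index (sufficient for 256 codevectors) are introduced here for the example.

```python
def encoded_size_bytes(num_vectors, codebook_size, bytes_per_vector, bytes_per_index=1):
    """Size of the codebook plus the array of indices, in bytes."""
    return codebook_size * bytes_per_vector + num_vectors * bytes_per_index


num_vectors = 1_000_000
bytes_per_vector = 100

original = num_vectors * bytes_per_vector                      # 100,000,000 bytes
encoded = encoded_size_bytes(num_vectors, 256, bytes_per_vector)  # 25,600 + 1,000,000
ratio = original / encoded                                     # roughly 97.5 : 1
```

The codebook contributes only 25,600 bytes, so the index array dominates the encoded size.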
[0021] In order to overcome the shortcomings of the
above method a recursive method for encoding
hyper-spectral data is disclosed. Referring to Figs. 3, 4 and 5,
a system and a method for encoding hyper-spectral data
according to the invention are shown. The entire
hyper-spectral datacube is provided via port 101 as the input
to processor 103 for recursive processing as shown in
Figs. 3 and 4. Memory 105, preferably RAM, is provided
to store data during the recursive processing.
[0022] First, a temporary codebook having a small 
number of codevectors is generated from the entire hy- 
per-spectral datacube. A codevector in the codebook 
corresponds to a cluster of spectral vectors and is the 
gravity center of the cluster. The codebook typically has 
fewer than 16 codevectors. Selection of such a codebook
to accurately reflect the vectors of a given datacube is 
unlikely and as such, less care needs to be taken to en- 
sure high fidelity of the results in a given pass. Option- 
ally, sampling is performed on a small percentage of 
spectral vectors within the hyper-spectral data and a set 
of codevectors is determined for reasonably approxi- 
mating the sampled vectors. 
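
One possible way to obtain such a small temporary codebook is sketched below. The text does not prescribe a generation method, so this sketch assumes a few passes of Lloyd (k-means style) refinement with evenly spaced initial codevectors; the names `small_codebook` and `sample_step` are hypothetical, and `sample_step=50` would correspond to sampling about 2% of the vectors.

```python
def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def centroid(vectors):
    """Centre of gravity of a non-empty list of vectors."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def small_codebook(vectors, n, sample_step=1, iterations=10):
    """Return up to n codevectors, each the centre of gravity of its cluster.

    Only every sample_step-th vector is used for training (optional sub-sampling).
    """
    training = vectors[::sample_step]
    step = max(1, len(training) // n)
    # Assumed initialisation: evenly spaced training vectors.
    codebook = [tuple(training[k * step]) for k in range(min(n, len(training)))]
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for v in training:
            k = min(range(len(codebook)), key=lambda j: sq_dist(v, codebook[j]))
            clusters[k].append(v)
        # An empty cluster keeps its previous codevector.
        codebook = [centroid(c) if c else cv for c, cv in zip(clusters, codebook)]
    return codebook


spectra = [(0, 0), (0, 2), (10, 10), (10, 12)]
full = small_codebook(spectra, n=2)                    # trained on all vectors
sampled = small_codebook(spectra, n=2, sample_step=2)  # trained on every second vector
```

Because later passes refine any poorly represented cluster, the initialisation and the number of refinement passes can be kept deliberately cheap.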

[0023] Next, the spectral vectors of the hyper-spectral 
datacube are encoded using the small codebook. A fi- 
delity is evaluated for each of the vectors which are en- 
coded by a codevector in the codebook. The fidelity is 
related to a closeness between the encoded vectors - 
real data - and the codevector with which the data was 
encoded. 
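
The text leaves the fidelity measure open. As a hedged illustration, the sketch below assumes that fidelity is judged by the root-mean-square error (RMSE) between the vectors of a cluster and their codevector, so that a fidelity "above the threshold" corresponds to an error at or below a chosen maximum. The names `cluster_rmse`, `is_faithful` and `max_rmse` are hypothetical.

```python
import math


def cluster_rmse(cluster, codevector):
    """RMSE over all components of all vectors in the cluster (assumed measure)."""
    total = sum((a - b) ** 2 for v in cluster for a, b in zip(v, codevector))
    return math.sqrt(total / (len(cluster) * len(codevector)))


def is_faithful(cluster, codevector, max_rmse):
    """True when the cluster is encoded closely enough by its codevector."""
    return cluster_rmse(cluster, codevector) <= max_rmse


cluster = [(0, 0), (0, 2)]
codevector = (0, 1)  # centre of gravity of the cluster
error = cluster_rmse(cluster, codevector)
```

Any other closeness measure, for example a signal-to-noise ratio, could be substituted without changing the structure of the method.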

[0024] When a fidelity of a cluster of vectors encoded 
by a codevector is above a predetermined threshold, 
that codevector is sufficient and is stored in the final 
codebook. The vectors encoded by that codevector are 
replaced with an index indicative of the codevector's lo- 
cation within the final codebook. 
[0025] When a fidelity evaluation is below the predetermined
threshold, the cluster of spectral vectors is provided
to the recursive block for another iteration. The
cluster of spectral vectors encoded by the codevector is
then reencoded with a new small codebook containing
a subset of codevectors, which is generated from that
cluster of spectral vectors. The subset of codevectors
may contain only one codevector or a plurality of
codevectors. The new small codebook results in improved
fidelity. The recursive process is continued until each
codevector yields a fidelity above a predetermined
threshold. This recursive process guarantees that an
encoded hyper-spectral datacube has a fidelity above the
threshold and, furthermore, each encoded spectral vector
within the hyper-spectral datacube has a fidelity
above the threshold.

[0026] Once all clusters have a fidelity that is better
than the predetermined fidelity, the recursion stops
since no more clusters remain as inputs to the recursive
block. At this point, the final codebook has a number of
codevectors, each spectral vector is now associated
with a codevector of the final codebook and the fidelity
with which each codevector is associated with its cluster
of vectors is known to be better than the predetermined
fidelity. Each spectral vector is then replaced with an
index indicative of the codevector's location within the
final codebook. The encoded hyper-spectral data - the
indices of the encoded spectral vectors and the final
codebook - are then provided to port 107 for transmission.
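
The recursive process of paragraphs [0022] to [0026] can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the small codebook is produced by a few Lloyd (k-means style) passes, fidelity is judged by an assumed RMSE limit `max_rmse`, vectors are tuples, and a safety guard (not described in the text) stores distinct vectors exactly if a cluster fails to split. All function names are hypothetical.

```python
import math


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def centroid(vectors):
    """Centre of gravity of a non-empty list of vectors."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def rmse(cluster, codevector):
    """Assumed error measure: a low RMSE stands for a high fidelity."""
    total = sum(sq_dist(v, codevector) for v in cluster)
    return math.sqrt(total / (len(cluster) * len(codevector)))


def small_codebook(vectors, n, iterations=10):
    """Up to n codevectors from a few Lloyd passes (assumed generation method)."""
    distinct = sorted(set(vectors))
    step = max(1, len(distinct) // n)
    codebook = [distinct[k * step] for k in range(min(n, len(distinct)))]
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for v in vectors:
            k = min(range(len(codebook)), key=lambda j: sq_dist(v, codebook[j]))
            clusters[k].append(v)
        codebook = [centroid(c) if c else cv for c, cv in zip(clusters, codebook)]
    return codebook


def store(codevector, member_ids, final_codebook, index_map):
    """Append a codevector to the final codebook and index its cluster."""
    final_codebook.append(codevector)
    for i in member_ids:
        index_map[i] = len(final_codebook) - 1


def encode_recursive(vectors, ids, n, max_rmse, final_codebook, index_map):
    # Encode the vectors listed in ids with a new small temporary codebook.
    codebook = small_codebook([vectors[i] for i in ids], n)
    clusters = [[] for _ in codebook]
    for i in ids:
        k = min(range(len(codebook)), key=lambda j: sq_dist(vectors[i], codebook[j]))
        clusters[k].append(i)
    for member_ids in clusters:
        if not member_ids:
            continue
        members = [vectors[i] for i in member_ids]
        codevector = centroid(members)
        if rmse(members, codevector) <= max_rmse:
            # Fidelity is sufficient: keep the codevector in the final codebook.
            store(codevector, member_ids, final_codebook, index_map)
        elif len(member_ids) == len(ids):
            # Safety guard (an addition, not from the text): the cluster did
            # not split, so store each distinct vector exactly.
            for v in sorted(set(members)):
                same = [i for i in member_ids if vectors[i] == v]
                store(v, same, final_codebook, index_map)
        else:
            # Fidelity is insufficient: reencode this cluster recursively.
            encode_recursive(vectors, member_ids, n, max_rmse, final_codebook, index_map)


cube = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50), (50, 60)]
final_codebook, index_map = [], [None] * len(cube)
encode_recursive(cube, list(range(len(cube))), 2, 1.0, final_codebook, index_map)
```

In this toy run the first pass yields two coarse clusters, both of which fail the assumed limit and are split again, giving four final codevectors. Each recursion works on fewer vectors than the previous one, which is the source of the speed advantage discussed in the following paragraph.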

[0027] To generate a small codebook is not a laborious
task since it is selected to best classify the vectors
of the datacube into clusters and does not necessarily
provide a "best" fidelity. Because the size of the codebook
generated at each recursion is small, and the
number of vectors in each cluster becomes smaller and
smaller with increasing number of recursions, the process
is very rapid and, therefore, advantageous. For example,
this method allows processing of the data in near
real time while mapping a surface area. Further, the
resulting codebook is guaranteed to provide better than
the predetermined fidelity for each encoded cluster and
not only for the entire datacube. Thus, for each codevector,
it is known that the data is well selected. Finally,
the recursive method is applicable regardless of the
desired fidelity. For example, setting the compression
fidelity to a level better than that of the spectrometer with
which the data is captured can result in a near lossless
compression or a lossless compression in the sense of
taking into account the instrument noise. Setting the
predetermined fidelity level to a worse fidelity results in
increased compression at that fidelity level or better.
Though higher values of predetermined fidelity result in
more processing time because further recursions are
required, the increase is not substantial, so the
tradeoff is mainly between fidelity and compression ratio.

[0028] Though the above method is described with relation
to recursion, it could be implemented in an iterative
fashion as well. This is evident to those of skill in
the art. Also, instead of dealing with each codevector
independently, it is also possible to group codevectors
having insufficiently high fidelities together to form a
subset. Each codevector is then replaced with a set of
codevectors and a single encoding step is performed for
the entire subset. Of course, since the codevectors are
already representative of known sets of vectors, such a
method is less efficient than the one described herein.
[0029] The compressed data - index map - produced
by this compression technique actually is a classification
of the hyper-spectral datacube. In addition the clusters
in the index map are well ordered. Similar clusters have
close class numbers. Moreover, controlling the fidelity
of compression can be used to control the accuracy of
the classification. That is a unique property which
supervised and unsupervised classification methods
do not possess. Compressing a datacube using HSOCVQ
takes less processing time than classifying the
datacube using a supervised or an unsupervised
classification method. The HSOCVQ is
useful as an alternative classification method.
[0030] When classification based on particular spec- 
tral channels is desired, the method with small modifi- 
cations supports such an application. For example, if 
classification is to be based on channels a0..ai as a first 
classification hierarchy and b0..bj as a second classifi- 
cation hierarchy - classify by a first and then by b within 
a - then the method is employed for only channels a0.. 
ai in a first stage. The first stage continues through iter- 
ations until classification fidelity is sufficient. Alternative- 
ly, only one iteration per classification hierarchy is per- 
formed. Because the method is being applied to only a 
subset of channels, it typically converges faster and ex- 
ecutes in less time. Once the first stage is completed, 
the channels b0..bj are added to the data for which the 
method is employed. Once again, the method typically 
converges rapidly for a small subset of channels. Once 
the second stage is completed, the remaining channels 
are also considered in accordance with the present 
method. 

[0031] Since only a subset of channels is considered 
in each of the first and second stages, the classification 
that results is heavily weighted to classify the data based 
on that subset. 
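
The hierarchical variant of paragraphs [0030] and [0031] can be illustrated by the sketch below, which follows the alternative of one clustering pass per classification hierarchy. It is only a sketch under assumptions: clustering uses a few Lloyd (k-means style) passes restricted to the selected channels, and the names `cluster_ids` and `classify_hierarchically`, as well as the toy three-channel vectors, are hypothetical.

```python
def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def centroid(vectors):
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def cluster_ids(vectors, ids, channels, n, iterations=10):
    """Split ids into up to n non-empty groups using only the given channels."""
    proj = {i: tuple(vectors[i][c] for c in channels) for i in ids}
    distinct = sorted(set(proj.values()))
    step = max(1, len(distinct) // n)
    codebook = [distinct[k * step] for k in range(min(n, len(distinct)))]
    for _ in range(iterations):
        groups = [[] for _ in codebook]
        for i in ids:
            k = min(range(len(codebook)), key=lambda j: sq_dist(proj[i], codebook[j]))
            groups[k].append(i)
        codebook = [centroid([proj[i] for i in g]) if g else cv
                    for g, cv in zip(groups, codebook)]
    return [g for g in groups if g]


def classify_hierarchically(vectors, channel_groups, n):
    """Classify by the first channel group, then by the next within each class."""
    classes = [list(range(len(vectors)))]
    for channels in channel_groups:
        classes = [sub for ids in classes
                   for sub in cluster_ids(vectors, ids, channels, n)]
    return classes


# Channel 0 plays the role of a0..ai, channel 1 of b0..bj; channel 2 is ignored here.
pixels = [(0, 0, 9), (0, 10, 1), (10, 0, 5), (10, 10, 7)]
first_stage = classify_hierarchically(pixels, [[0]], n=2)
both_stages = classify_hierarchically(pixels, [[0], [1]], n=2)
```

Because each stage sees only its own channels, the third channel has no influence on the resulting classes, which mirrors the weighting effect described above.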

[0032] An exemplary use of hierarchical classification 
is as follows. When a spectrometric instrument is
known to have more error in data captured on some 
channels than on others whether because of external 
effects or apparatus limitations, those channels are se- 
lected to be in a lowest classification order such that 
classification is not heavily weighted by error data. Since 
much of the data reconstruction error will be focused in 
those channels evaluated last, such a method allows for 
compression of spectral channels with less error more 
accurately than those with substantial error. Further, if 
the channels representative of error data are known a 
priori, such a method invokes little or no penalty in per- 
formance or data reconstruction quality. In fact, it is likely 
to improve both. 

[0033] With this compression technique, small ob- 
jects and small clusters are well preserved after com- 
pression if an appropriate threshold is selected. Setting 
the compression fidelity to a level better than that of the 
instrument with which the data is captured results in a 
near lossless compression or a lossless compression in 
the sense of taking into account the instrument noise.



[0034] Referring to Figs. 6 and 7, another system and
method for encoding hyper-spectral data according to
the invention are shown. The hyper-spectral datacube is
provided via port 201 as the input to processor 203 for
recursive processing as shown in Fig. 6. Memory 205,
preferably RAM, is provided to store data during the
recursive processing. This method differs from the method
disclosed above in that the fidelity of the encoded spectral
vectors is not determined during the recursive process.
Each cluster of spectral vectors encoded by a codevector
of the initial small codebook is reencoded with a
new small codebook, which is generated from that cluster
of spectral vectors. The new small codebook results
in improved fidelity. This step is then repeated until a
desired number of codevectors exist within each of the
codebooks. After the recursive process is finished for all
clusters a fidelity of the encoded hyper-spectral data is
determined and subsets of codevectors associated with
encoded hyper-spectral data having a fidelity above a
predetermined fidelity are selected. Finally, the selected
subsets of codevectors are stored in a final codebook,
and each of the spectral vectors of the encoded
hyper-spectral data is replaced with an index indicative of the
codevector's location within the final codebook. The
encoded hyper-spectral data - the indices of the encoded
spectral vectors and the final codebook - are then
provided to port 207 for transmission.
[0035] By avoiding the determination of the fidelity
during the recursive process this method can be executed
as a straightforward iteration and is, therefore,
faster than the above method. Of course, because no
fidelity is determined, this method fails to weight the
encoding process in order to achieve improved fidelity and,
as such, is less preferred.
[0036] Referring to Fig. 8, an embodiment of the invention
using iteration instead of recursion is shown.
Here, during each iteration at least one cluster is
reencoded. It is possible to reencode all clusters having a
fidelity below the threshold fidelity or a subset of those
clusters. During a subsequent iteration, a larger number
of clusters exist and the process repeats itself. The
iterations stop when convergence occurs or when another
indicator is determined, for example, when a maximum
number of clusters is reached. Though the method is
shown iteratively, it produces results very similar to those
of the method of Fig. 3.

[0037] According to another method, only a cluster
with a worst fidelity is reencoded during each pass.
Therefore, after each pass, all clusters are known to
have better than the worst fidelity determined in a
previous pass. Such an algorithm converges toward an
overall better fidelity with each pass and can be terminated
after a number of iterations, when a worst fidelity
is above a predetermined threshold, when a number of
clusters is encoded with at least a predetermined fidelity
and so forth. For example, if the method is stopped when
4096 clusters exist, a codebook having 4096 codevectors
results, wherein the codevectors represent the original
data with a fidelity of at least that of the worst fidelity
for a cluster. Since the cluster encoded with a worst
fidelity is reencoded each pass, such a method avoids
the problem of reencoding better clusters merely because
they are encoded with a fidelity below a given
threshold. For example a cluster with a fidelity F below
the fidelity Fthreshold is typically reencoded. If the process
is halted when 4096 codevectors exist, there may
remain clusters with fidelities of below F. If the cluster
with a worst fidelity of encoding is reencoded first this is
not the case.
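
The worst-first strategy described above lends itself to a priority queue. The sketch below is an illustration under assumptions rather than the disclosed implementation: fidelity is again represented by an assumed RMSE limit, clusters are split by a few Lloyd (k-means style) passes, and the names `split`, `encode_worst_first`, `max_rmse` and `max_clusters` are hypothetical. Since one pass may add up to n clusters, the final count can slightly exceed `max_clusters` when n is greater than 2.

```python
import heapq
import itertools
import math


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def centroid(vectors):
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def rmse(cluster, codevector):
    total = sum(sq_dist(v, codevector) for v in cluster)
    return math.sqrt(total / (len(cluster) * len(codevector)))


def split(vectors, ids, n, iterations=10):
    """Split a cluster (list of vector ids) into up to n non-empty sub-clusters."""
    distinct = sorted(set(vectors[i] for i in ids))
    step = max(1, len(distinct) // n)
    codebook = [distinct[k * step] for k in range(min(n, len(distinct)))]
    for _ in range(iterations):
        groups = [[] for _ in codebook]
        for i in ids:
            k = min(range(len(codebook)), key=lambda j: sq_dist(vectors[i], codebook[j]))
            groups[k].append(i)
        codebook = [centroid([vectors[i] for i in g]) if g else cv
                    for g, cv in zip(groups, codebook)]
    return [g for g in groups if g]


def encode_worst_first(vectors, n, max_rmse, max_clusters):
    order = itertools.count()  # tie-breaker so id lists are never compared
    heap = []                  # entries: (-error, order, ids); worst cluster on top

    def push(ids):
        members = [vectors[i] for i in ids]
        error = rmse(members, centroid(members))
        heapq.heappush(heap, (-error, next(order), ids))

    push(list(range(len(vectors))))
    # Reencode only the worst cluster in each pass.
    while len(heap) < max_clusters and -heap[0][0] > max_rmse:
        _, _, ids = heapq.heappop(heap)
        parts = split(vectors, ids, n)
        for part in parts:
            push(part)
        if len(parts) < 2:
            break  # guard (an addition): the worst cluster could not be split
    codebook, index_map = [], [None] * len(vectors)
    for _, _, ids in sorted(heap, key=lambda entry: entry[1]):
        codebook.append(centroid([vectors[i] for i in ids]))
        for i in ids:
            index_map[i] = len(codebook) - 1
    return codebook, index_map


cube = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50), (50, 60)]
codebook, index_map = encode_worst_first(cube, n=2, max_rmse=1.0, max_clusters=10)
capped_codebook, capped_map = encode_worst_first(cube, n=2, max_rmse=1.0, max_clusters=2)
```

With a cap on the number of clusters, the remaining clusters are always the best encoded ones found so far, which is the property emphasised in the paragraph above.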

[0038] As is evident to those of skill in the art, the 
present invention does not rely on geometric clustering 
of spectra. Instead, clustering is performed using a clas- 
sification system to result in clusters of similar spectra. 
This results in error that is randomly distributed through- 
out the hyperspectral data and not spatially expressed 
therein. In contrast, most prior art systems rely on geo- 
metric subsampling and thereby often result in spatially 
expressed errors in the reconstructed data. Prior art sys- 
tems rely on classification of specific groups of spectral 
channels whereas the present invention relies on clas- 
sification of all spectral channels simultaneously when 
so desired. 

[0039] Alternatively or in conjunction, algorithms are 
employable which predict convergence of the method 
and error likelihood in the process. The use of predictive 
algorithms allows a system to balance performance
in terms of processing resource usage and performance
based on compression ratio and performance based on
compressed data fidelity. These three performance
indicators are often difficult to balance in a lossy
compression system.

[0040] Though in the above description, the exact 
method of determining codevectors within a codebook 
is not disclosed, there are many such methods in the 
prior art. Any such method is applicable to the present 
invention. Preferably, such a method is applied in a more
time-efficient fashion that risks fidelity in order to improve
performance. Since fidelity will be improved in further 
stages, the fidelity of each individual stage is less sig- 
nificant than with the prior art. 

[0041] For example, a method of fast vector quanti- 
sation involves a step of training set sub-sampling. Such 
a method, when integrated into the present inventive 
method, results in significant advantages. It is generally 
known that to generate a given size codebook the Code- 
book Generation Time (CGT) is proportional to the size 
of the training set. The CGT at each stage of the present 
inventive method is improved by a factor of approxi- 
mately 1/sampling rate (SR), if the training set is sub- 
sampled at a rate of SR. For example, if the training set 
consists of only 2% of the vectors within the datacube, 
the CGT is improved by a factor of approximately 50. 
For an iterative or recursive method such as the present 
invention, the speed improvement is very significant 
since numerous codebooks are generated in the vector 
quantisation of a single datacube. 
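
By way of illustration only, the relationship between the sampling rate and the codebook generation time may be sketched as follows. The sketch assumes random sub-sampling without replacement and a datacube already flattened to one spectral vector per row; the function names `subsample_training_set` and `approx_cgt_speedup` and the array sizes are hypothetical and are not taken from the description.

```python
import numpy as np

def subsample_training_set(vectors, sampling_rate, seed=0):
    """Return a random subset holding about sampling_rate of the vectors.

    Random selection is an assumption; the description only states that
    the training set is sub-sampled at a rate SR.
    """
    rng = np.random.default_rng(seed)
    n_total = vectors.shape[0]
    n_keep = max(1, int(round(n_total * sampling_rate)))
    keep = rng.choice(n_total, size=n_keep, replace=False)
    return vectors[keep]

def approx_cgt_speedup(sampling_rate):
    """CGT is taken as proportional to training set size, so the
    approximate improvement factor is 1 / SR."""
    return 1.0 / sampling_rate

# Hypothetical datacube flattened to (spectral vectors, spectral bands).
cube = np.random.default_rng(1).random((10_000, 32))
training = subsample_training_set(cube, 0.02)
speedup = approx_cgt_speedup(0.02)
print(training.shape, speedup)
```

With a 2% sampling rate the training set holds 200 of the 10,000 vectors, which corresponds to the factor of approximately 50 given above.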



[0042] Of course, since the method generates small
codebooks representative of classifications, sampling of
spectral vectors is also useful in the classification when
spectral location is a significant factor in the
classification process.

[0043] The overall processing time of the present
algorithm with sub-sampling is comparable to that of the
fastest Multiple Sub-Codebook Algorithm (MSCA). This is
expected, since for both algorithms the codebook
generation time decreases when the training set size and
the codebook size are reduced. Further, the coding time is
reduced when the codebook size is decreased. The
distinguishing feature of the present invention is a
resulting improved fidelity, improved compression ratio,
and an ability to selectively trade off between these two
measures of performance.

[0044] Of course, the present invention will also function
with any of a variety of methods of codebook generation
such as the LBG method, the Spectral Feature Based Binary
Coding (SFBBC) method, and MSCA.

[0045] Also, during an encoding step, the SFBBC method
and the Correlation Vector Quantization (CVQ) method are
suitable alternatives to the LBG algorithm. MSCA, when
used, is suitable for replacing both the codebook
generation and the encoding steps.

[0046] Selection of codebook generation techniques and of
encoding techniques within the framework of the inventive
method is a matter for one skilled in the art based on
design parameters relating to system performance and
based on their available working program code. For
example, someone with working code for performing MSCA
may choose to use MSCA for both steps to obviate a need to
further develop encoding and codebook selection code,
thereby speeding up time to market. Of course, numerous
parameters affect selection of techniques within the
inventive method. This is evident to those of skill in the
art.

[0047] According to an embodiment of the invention, a
fixed fidelity is provided at an outset of the compression
process. For example, the fidelity is provided as a
signal-to-noise ratio. The compression process then
proceeds through a sufficient number of stages until the
reconstruction fidelity for each codevector reaches the
predetermined fixed fidelity. In one embodiment the
algorithm adaptively selects the codebook size at each
approximation stage to yield a best compression ratio and
fastest processing time. Alternatively, a same codebook
size is used for each stage.
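
A minimal sketch of such a fixed-fidelity mode is given below, for illustration only. It assumes a same codebook size at every stage, a small Lloyd-style trainer standing in for LBG or any other codebook generation method, and a signal-to-noise ratio in decibels as the fidelity measure. All function names, the guard against clusters that cannot be split, and the test data are hypothetical.

```python
import numpy as np

def train_codebook(vectors, n_codes, iters=10, seed=0):
    """Small Lloyd-style trainer (stand-in for LBG); returns cluster labels."""
    rng = np.random.default_rng(seed)
    n_codes = min(n_codes, len(vectors))
    codebook = vectors[rng.choice(len(vectors), n_codes, replace=False)].copy()
    for _ in range(iters):
        dist = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(n_codes):
            if np.any(labels == k):
                codebook[k] = vectors[labels == k].mean(axis=0)
    dist = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def snr_db(cluster, codevector):
    """Fidelity of a cluster encoded by one codevector, as SNR in dB."""
    signal = float(np.sum(cluster ** 2))
    noise = float(np.sum((cluster - codevector) ** 2))
    return np.inf if noise == 0.0 else 10.0 * np.log10(signal / noise)

def encode(vectors, idx, target_db, n_codes, final_codebook, index_map):
    """Re-encode each cluster until it reaches the fixed fidelity target_db."""
    labels = train_codebook(vectors[idx], n_codes)
    for k in range(labels.max() + 1):
        members = idx[labels == k]
        if members.size == 0:
            continue
        centre = vectors[members].mean(axis=0)  # centre of gravity of the cluster
        good = snr_db(vectors[members], centre) >= target_db
        # Guard (an assumption of this sketch): accept a cluster that was not split.
        if good or members.size == idx.size:
            final_codebook.append(centre)
            index_map[members] = len(final_codebook) - 1
        else:
            encode(vectors, members, target_db, n_codes, final_codebook, index_map)

# Hypothetical data: 500 spectral vectors with 8 bands each.
data = np.random.default_rng(42).random((500, 8)) + 1.0
final_codebook = []
index_map = np.full(len(data), -1)
encode(data, np.arange(len(data)), 20.0, 4, final_codebook, index_map)
reconstructed = np.array(final_codebook)[index_map]
```

Each accepted cluster contributes one codevector to the final codebook, and the index map records, for every spectral vector, the address of the codevector that encodes it.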
[0048] Lossy compression can be treated as lossless or
near-lossless compression from the point of view of
applications, if the level of noise (error) introduced by
a lossy compressor is smaller than that of the intrinsic
noise of the original data. In using the Fixed Fidelity
mode, one can implement near-lossless compression by
setting the value of the fidelity threshold to be slightly
less than the signal-to-noise ratio of the original data.
[0049] In another embodiment, an operating mode is used
wherein the additional compression vs. fidelity is
evaluated at each iteration to determine if another
iteration will improve the fidelity for a cluster
sufficiently to be warranted. Despite an increasing number
of approximation stages (classification processes), after
a certain number of stages the reconstruction fidelity
increase is very limited when compared to a decrease in
the compression ratio. This point, which is referred to as
an inflection point of the graph, is detected, and
compression proceeds automatically stage by stage until
the inflection point is detected. In an additional
embodiment the process adaptively selects codebook size at
each stage to yield a best compression ratio and fastest
processing time. Clearly, using such a process allows for
a careful balance between compression ratio and fidelity.
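
One possible stopping rule of this kind is sketched below. The description does not state how the inflection point is detected, so the criterion used here (fidelity gained per unit of compression ratio given up, compared against a threshold) is an assumption, as are the function name `find_stop_stage` and the per-stage figures.

```python
def find_stop_stage(ratios, fidelities, min_gain_per_ratio):
    """Return the index of the last stage worth keeping.

    ratios[i] and fidelities[i] are the compression ratio and the
    reconstruction fidelity (in dB) after stage i. Compression stops
    before the first stage whose fidelity gain per unit of compression
    ratio lost falls below min_gain_per_ratio.
    """
    for i in range(1, len(ratios)):
        ratio_loss = ratios[i - 1] - ratios[i]
        gain = fidelities[i] - fidelities[i - 1]
        if ratio_loss <= 0:
            continue  # the stage cost no compression, so it is kept
        if gain / ratio_loss < min_gain_per_ratio:
            return i - 1
    return len(ratios) - 1

# Hypothetical per-stage measurements.
ratios = [100.0, 60.0, 40.0, 30.0, 25.0]
fidelities = [20.0, 30.0, 36.0, 37.0, 37.2]
stop = find_stop_stage(ratios, fidelities, 0.2)
print(stop)
```

In this example the fourth stage gains only 1 dB for a loss of 10 in compression ratio, so the process stops after the third stage (index 2).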
[0050] For some data sets to be compressed, hierarchical
self-organizing cluster vector quantisation produces many
small-size clusters in the hierarchical self-organizing
cluster extension process in order to provide a
fidelity of the reconstructed data for each spectral clus- 
ter better than the desired fidelity. In hierarchical self- 
organizing cluster vector quantisation, a cluster corre- 
sponds to a codevector, which is, statistically speaking, 
the centre of gravity of the cluster. The spectral vectors 
in that cluster are all encoded by a same codevector. 
The larger the size of a cluster, the higher the compres- 
sion ratio of the cluster since the larger group of vectors 
is represented by a single codevector. The overall com- 
pression ratio decreases when there are many small- 
size clusters. An approach of merging small-size similar 
clusters into a larger cluster is used in an embodiment 
of hierarchical self-organizing cluster vector quantisa- 
tion. A cluster is treated as a small-size cluster if its size 
is smaller than a pre-set value. Alternatively, it is treated 
as a small cluster when its compression ratio is below 
a predetermined value. Further alternatively, when the 
overall compression ratio is below a predetermined value,
the smallest clusters are labelled small-size clusters.
All small-size clusters are examined during the com- 
pression process. Similar small-size clusters are 
merged into a single larger cluster before a step of de- 
coding is performed. Thus two small-sized clusters may 
be encoded by a same codevector. Only unique small- 
size clusters are not merged and are transmitted. This 
approach significantly increases the compression ratio 
for applications of the process resulting in many small- 
size clusters due to early classifications that separated 
similar vectors. Often, the process of merging small-size 
clusters results in little or no fidelity penalty. Further, if 
the merger process is conducted when clusters get 
small but before the iterative process is complete, the 
merger process merely regroups vectors for classifica- 
tion in a later stage and does not add any additional er- 
ror. 
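
The merging of similar small-size clusters may be sketched as follows, for illustration only. The description does not define the similarity measure; this sketch assumes the Euclidean distance between codevectors, a greedy pairing order, and a size-weighted centre of gravity for the merged cluster. The names `merge_small_clusters`, `min_size` and `max_dist` are hypothetical.

```python
import numpy as np

def merge_small_clusters(centres, sizes, min_size, max_dist):
    """Greedily merge small clusters whose codevectors lie within max_dist.

    A cluster is treated as small when it holds fewer than min_size
    vectors. The merged codevector is the size-weighted mean of the
    merged codevectors, i.e. the centre of gravity of the merged cluster.
    """
    centres = [np.asarray(c, dtype=float) for c in centres]
    sizes = list(sizes)
    used = [False] * len(centres)
    merged_centres, merged_sizes = [], []
    for i in range(len(centres)):
        if used[i]:
            continue
        centre, size = centres[i], sizes[i]
        if size < min_size:
            for j in range(i + 1, len(centres)):
                is_small = sizes[j] < min_size
                if used[j] or not is_small:
                    continue
                if np.linalg.norm(centres[j] - centre) <= max_dist:
                    centre = (centre * size + centres[j] * sizes[j]) / (size + sizes[j])
                    size += sizes[j]
                    used[j] = True
        merged_centres.append(centre)
        merged_sizes.append(size)
    return merged_centres, merged_sizes

# Hypothetical clusters: two similar small ones, one large, one unique small.
centres = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [9.0, 9.0]]
sizes = [3, 2, 50, 4]
merged_centres, merged_sizes = merge_small_clusters(centres, sizes, 10, 0.5)
print(merged_sizes)
```

Here the two similar small clusters are encoded by a single codevector after the merger, while the unique small cluster is kept unmerged.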

[0051] Of course, numerous other embodiments may 
be envisaged without departing from the spirit and 
scope of the claimed invention. 



Claims 

1. A method for encoding a hyper-spectral image datacube
using vector quantisation, wherein the encoded
hyper-spectral data is compressed data, the method
comprising the steps of:

a) determining a codebook having a plurality of
codevectors;

b) encoding each spectral vector of the hyper-spectral
data by determining a codevector within the codebook that
approximates the spectral vector within the hyper-spectral
data;

c) determining a fidelity of a cluster of the spectral
vectors encoded with a subset of at least a codevector;
and,

d) when the fidelity of a cluster of spectral vectors
encoded by the subset of at least a codevector is below a
predetermined fidelity, determining a further codebook
relating to the subset of the at least a codevector,
encoding each spectral vector of the cluster of vectors
encoded by at least a codevector within the subset using
the further codebook, and returning to step (c).

2. A method for encoding a hyper-spectral image datacube
using vector quantisation as defined in claim 1 wherein
the subset of at least a codevector consists of a single
codevector.

3. A method for encoding a hyper-spectral image datacube
using vector quantisation as defined in claim 2 wherein
the cluster of spectral vectors includes all spectral
vectors encoded by the single codevector.



4. A method for encoding a hyper-spectral image datacube
as defined in claim 3, comprising the steps of:

when the fidelity of spectral vectors encoded by a
codevector is above a predetermined fidelity, storing that
codevector in a final codebook, and storing within an
index map an index indicative of the codevector's location
in the final codebook.



5. A method for encoding a hyper-spectral image datacube
as defined in claim 4, wherein the codevectors within the
further codebook are determined based on vectors
associated with the at least a codevector in the subset.

6. A method for encoding a hyper-spectral image datacube
as defined in claim 1, comprising the steps of:

when the fidelity of spectral vectors encoded by a
codevector is above a predetermined fidelity, storing that
codevector in a final codebook, and storing within an
index map an index indicative of the codevector's location
within the final codebook.

7. A method for encoding a hyper-spectral image datacube
as defined in claim 1, wherein the codevectors within the
further codebook are determined
based on vectors associated with the at least a 
codevector in the subset. 

8. A method for encoding a hyper-spectral image datacube
as defined in claim 1, wherein step a) comprises the
steps of sampling a subset of vectors
within the hyper-spectral data and determining a set 
of codevectors for reasonably approximating the 
sampled vectors. 

9. A method for encoding a hyper-spectral image da- 
tacube as defined in claim 8, wherein sampling is 
performed on a small percentage of vectors within 
the hyper-spectral data. 

10. A method for encoding a hyper-spectral image da- 
tacube as defined in claim 9, wherein the set of 
codevectors consists of fewer codevectors than is 
required to encode the datacube with the predeter- 
mined fidelity. 

11. A method for encoding a hyper-spectral image datacube
as defined in claim 6, wherein the codebook and the index
map form a classification of the hyper-spectral datacube
and, wherein the accuracy of the classification is
controllable through selection of the predetermined
fidelity.

12. A method for encoding a hyper-spectral image da- 
tacube using vector quantisation as defined in claim 
1 comprising the step of: 

predicting a convergence of the encoding steps 
to determine a condition for stopping execution 
of the steps. 

13. A method for encoding a hyper-spectral image da- 
tacube using vector quantisation as defined in claim 
1 comprising the step of: 

predicting a convergence of the encoding steps 
to determine an approximate amount of re- 
sources for performing the encoding. 

14. A method for encoding a hyper-spectral image da- 
tacube using vector quantisation as defined in claim 
1 comprising the steps of: 



selecting a group of spectral vectors from the
hyperspectral data as the plurality of codevectors; and,

at intervals, adding additional spectral vectors from the
hyperspectral data to the plurality of codevectors.

15. A method for encoding image data using vector
quantisation as defined in claim 1, wherein the further
codebook is of a size different from the codebook.

16. A method for encoding image data using vector
quantisation as defined in claim 1, wherein at least one
of steps (a) and determining a further codebook are
performed by at least one of the following: LBG, SFBBC,
and MSCA.

17. A method for encoding image data using vector
quantisation as defined in claim 16, wherein at least one
of steps (b) and encoding each spectral vector of the
cluster of vectors encoded by at least a codevector within
the subset using the further codebook are performed by at
least one of the following: CVQ, SFBBC, and MSCA.

18. A method for encoding image data using vector
quantisation as defined in claim 1, wherein upon encoding
each spectral vector of the cluster of vectors encoded by
at least a codevector within the subset using the further
codebook a comparison is made between an improvement in
fidelity for the cluster and a compression ratio for the
cluster to provide a comparison result wherein the method
is halted when the comparison results are below a
threshold improvement.



19. A method for encoding image data using vector 
quantisation as defined in claim 1, comprising the 
step of:



when a cluster has fewer vectors than a prede- 
termined number of vectors, determining an- 
other cluster similar to the cluster and merging 
the two clusters to form one larger cluster.

20. A method for encoding image data using vector
quantisation as defined in claim 1, comprising the 
step of: 

when a compression ratio of the overall da- 
tacube is less than a predetermined compres- 
sion ratio determining a cluster having approx- 
imately a fewest vectors, determining another 
cluster similar to the subset, and merging the 
two clusters to form one larger cluster. 

21. A method for encoding a hyper-spectral image datacube
using vector quantisation, wherein the encoded
hyper-spectral data is compressed data, the
method comprising the steps of: 

a) determining a plurality of codevectors; 

b) encoding each spectral vector of the hyper- 
spectral data by determining a codevector with- 
in the plurality of codevectors that approxi- 
mates the spectral vector within the hyper- 
spectral data; and, 

c) repeating the steps of: 

determining another plurality of codevec- 
tors relating to a subset of codevectors 
within the previously determined code- 
book, encoding each spectral vector within 
the hyper-spectral data that is associated 
with codevectors within the subset using 
the other plurality of codevectors, 
until a desired number of codevectors ex- 
ist. 

22. A method for encoding a hyper-spectral image da- 
tacube as defined in claim 21 wherein the codevec- 
tors are stored within a final codebook and wherein 
encoding of the codevectors is performed accord- 
ing to the step of: 

storing within an index map an index indicative of the
codevector within the final codebook with which a vector
is encoded.

23. A method for encoding a hyper-spectral image datacube
as defined in claim 22, comprising the steps of:

determining a fidelity of subsets of the encoded
hyper-spectral data;

selecting subsets of codevectors associated with encoded
hyperspectral data having a fidelity above a predetermined
fidelity.

24. A method for encoding a hyper-spectral image datacube
as defined in claim 23, comprising the steps of:

storing the selected subsets of codevectors in a final
codebook, and storing within the index map an index
indicative of the codevector's location within the final
codebook.

25. A method for encoding a hyper-spectral image datacube
as defined in claim 26, comprising the steps of:

determining a fidelity of subsets of the encoded
hyper-spectral data;

selecting subsets of codevectors associated with encoded
hyperspectral data having a fidelity below a predetermined
fidelity and performing step (c) thereon.

26. A method for encoding a hyper-spectral image datacube
as defined in claim 21, wherein the step (c) comprises the
steps of:

determining a fidelity of the subset of the encoded
hyper-spectral data;

when the fidelity is above a predetermined fidelity,
selecting another different subset;

when the fidelity is below the predetermined fidelity
performing the step of determining another plurality of
codevectors relating to a subset of codevectors within the
previously determined codebook, encoding each spectral
vector within the hyper-spectral data that is associated
with codevectors within the subset using the other
plurality of codevectors.

27. A method for encoding image data using vector
quantisation as defined in claim 21, wherein each subset
of codevectors includes a single codevector and each
single codevector has an associated set of spectral
vectors encoded thereby.

28. A method for encoding image data using vector
quantisation as defined in claim 27, comprising the step
of:

when a set of spectral vectors has fewer vectors than a
predetermined number of vectors, determining another set
of spectral vectors similar to the set of spectral vectors
and merging the two sets of spectral vectors to form one
larger set of spectral vectors.

29. A method for encoding image data using vector
quantisation as defined in claim 21, comprising the step
of:

when a compression ratio of the overall datacube is less
than a predetermined compression ratio determining a set
of spectral vectors having approximately a fewest vectors,
determining another set of spectral vectors similar to the
plurality of codevectors, and merging the two sets of
spectral vectors to form one larger set of spectral
vectors.

30. A system for encoding a hyper-spectral image datacube
using vector quantisation, wherein the encoded
hyper-spectral data is compressed data, the system
comprising:

a first port for receiving the hyper-spectral image data;

a processor for:

a) determining a codebook having a plural- 
ity of codevectors; 

b) encoding each spectral vector of the hy- 
per-spectral data by determining a code- 
vector within the codebook that approxi- 
mates the spectral vector within the hyper- 
spectral data; 

c) determining a fidelity of a cluster of vec- 
tors within the encoded hyper-spectral da- 
ta; and, 

d) when the fidelity of a cluster of spectral 
vectors encoded by a codevector is below 
a predetermined fidelity, determining an- 
other codebook based on the cluster of 
spectral vectors, encoding each spectral 
vector of the cluster of spectral vectors 
within the hyper-spectral data that is asso- 
ciated with codevectors within the subset 
using the other codebook, and returning to 
step (c); 

memory for storing data during execution of 
steps a) to d); and, 

a second port for providing the encoded hyper- 
spectral data. 

31. A system for encoding a hyper-spectral image da- 
tacube using vector quantisation as defined in claim 
30, wherein the processor executes the steps of: 

when the fidelity of spectral vectors encoded by 
a codevector is above a predetermined fidelity, 
storing that codevector in a final codebook, and 
storing in the index map at locations associated 
with each vector encoded by the codevector an 
index indicative of the codevector address within
the final codebook. 

32. A system for encoding a hyper-spectral image da- 
tacube using vector quantisation as defined in claim 
30, wherein the memory comprises RAM. 

33. A system for encoding a hyper-spectral image da- 
tacube using vector quantisation as defined in claim 
30, wherein the hyper-spectral image data are en- 
coded in real time. 



34. A system for encoding a hyper-spectral image datacube
using vector quantisation as defined in claim 30, wherein
the first port and the second port are a same port.

35. A system for encoding a hyper-spectral image datacube
using vector quantisation, wherein the encoded
hyper-spectral data is compressed data, the system
comprising:

a first port for receiving the hyper-spectral image data;

a processor for:

a) determining a plurality of codevectors;

b) encoding each spectral vector of the hyper-spectral
data by determining a codevector within the plurality of
codevectors that approximates the spectral vector within
the hyper-spectral data; and,

c) repeating the steps of:

determining another plurality of codevectors based on
spectral vectors within the hyper-spectral data and
associated with a codevector, encoding each spectral
vector within the hyper-spectral data that is associated
with the codevector using the other plurality of
codevectors,

until a desired number of codevectors exist within each of
the pluralities of codevectors;

memory for storing data during execution of steps (a) to
(c); and,

a second port for providing the encoded hyper-spectral
data.

36. A system for encoding a hyper-spectral image datacube
using vector quantisation as defined in claim 35, wherein
the processor executes the steps of:

determining a fidelity of the encoded hyper-spectral data;

selecting subsets of codevectors associated with encoded
hyperspectral data having a fidelity above a predetermined
fidelity;

storing the selected subsets of codevectors in a codebook;
and,

storing within an index map indices indicative of the
codevector's location within the codebook.
Figure 4

IMAGE DATA

a) determining a first codebook having a plurality of codevectors

b) encoding each image vector of the image data by determining a
codevector within the first codebook that approximates the image vector
within the image data

c) creating a first index map by replacing the image vectors with an
index indicative of the codevector's location within the first codebook

d) determining difference data based on the original image data and the
encoded image data

e1) determining another codebook

e2) encoding each error vector of the difference data by determining a
codevector within the other codebook that approximates the error vector
within the difference data

e3) creating another index map by replacing the error vectors with an
index indicative of the codevector's location within the other codebook

e4) determining new difference data based on the original image data,
the encoded image data and the encoded difference data

Repeat until a control error of the difference data is smaller than a
given threshold
Figure 5

a) determining a first codebook having a plurality of codevectors;

b) encoding each image vector of the image data by determining a
codevector within the first codebook that approximates the image vector
within the image data;

c) creating a first index map by replacing the image vectors with an
index indicative of the codevector's location within the first codebook;

d) determining difference data based on the original image data and the
encoded image data; and,

e) repeating the steps of:

e1) determining another codebook;

e2) encoding each error vector of the difference data by determining a
codevector within the other codebook that approximates the error vector
within the difference data;

e3) creating another index map by replacing the error vectors with an
index indicative of the codevector's location within the other codebook;
and,

e4) determining new difference data based on the difference data and the
encoded difference data;

until a control error of the difference data is smaller than a given
threshold;

ENCODED IMAGE DATA
Figure 6

a) receiving from a first station connected to the
communications network first data indicative of a first
codebook and a first index map

b) transmitting the first data via the communications network
from the first station to a second station

c1) receiving data indicative of a consecutive codebook and
a consecutive index map

c2) transmitting the data via the communications network
from the first station to the second station

Repeat until a fidelity of an image reconstructed from the
transmitted data at the second station is above a predetermined
threshold or until all codebooks and all index maps of the
encoded image data have been transmitted
Figure 7

a) decoding each encoded image vector of the image data
using a first codebook and a first index map

b) reconstructing first image data based on the decoded
image vectors

c1) decoding each encoded error vector of consecutive
error data using a consecutive codebook and a consecutive
index map

c2) reconstructing image data based on the first image data
and the decoded error vectors of consecutive error data

Repeat until a fidelity of the reconstructed image data is
above a predetermined threshold or until all codebooks and
all index maps of the encoded image data have been decoded

RECONSTRUCTED IMAGE DATA
a) decoding each encoded image vector of the image data using a
first codebook and a first index map;

b) reconstructing first image data based on the decoded image
vectors; and,

c) repeating the steps of:

c1) decoding each encoded error vector of consecutive
difference data using a consecutive codebook and a
consecutive index map; and,

c2) reconstructing image data based on the first image
data and the decoded error vectors of consecutive error
data;

until a fidelity of the reconstructed image data is above a
predetermined threshold or until all codebooks and all index
maps of the encoded image data have been decoded;